DETAILED ACTION
Response to Arguments 

1. Applicant's arguments filed 12/8/06 have been fully read and considered but they
are not persuasive. 

The 35 U.S.C. 112 rejection is withdrawn after the amendment to claim 36.

Regarding pages 12-14 of applicant's remarks about claim 37, applicant asserts
that the limitations of "means for calculating a partial model for each segment...",
"means for extracting virtual key frames...", and "the three-dimensional coordinates and
camera pose being derived from the frames of the segment" are not disclosed in Jain.
The examiner respectfully disagrees. In figure 12, Jain discloses multiple
"image to ground projection" sections that are used to calculate and project an image or
a partial model for each segment that includes three-dimensional occupancy
estimation, for which a 3D map is generated in an attempt to form a dynamic model.
In column 21, line 63 to column 22, line 7, Jain discloses the obtaining of the feature
points within the frames. In column 22, line 62 to column 23, line 56, Jain discloses the
use of equations that include three-dimensional coordinates (x, y, z), which include
camera position or pose, camera angle and camera parameter, to obtain a partial model
or an "image to ground projection". Thus, Jain discloses the "means for calculating a
partial model for each segment..." In column 23, line 58 to column 24, line 3, Jain
discloses the extraction of key frames by selecting one key frame from every 30 frames,
in that every 30 frames can be considered a segment of a sequence of frames. Thus,
Jain discloses the "means for extracting virtual key frames..."

As for "the three-dimensional coordinates and camera pose being derived
from the frames of the segment", Jain discloses segments within a sequence
of frames; otherwise, the ascertainment of key frames would not be possible,
where each segment is formed from a sequence of 30 frames. And in
fig. 12, Jain discloses multiple "image to ground projection" sections that are
used to calculate and project an image or a partial model for each segment that
includes three-dimensional occupancy estimation, for which a 3D map is generated in
an attempt to form a dynamic model. In column 21, line 63 to column 22, line 7, Jain
discloses the obtaining of the feature points within the frames. In column 22, line
62 to column 23, line 56, Jain discloses the use of equations that include
three-dimensional coordinates (x, y, z), which include camera position or pose, camera angle
and camera parameter, to obtain a partial model or an "image to ground projection".

Thus, claim 37 is met by Jain. 

Regarding pages 14-17 of applicant's remarks, applicant states that the
combination of Jain and Lee is improper for claims 1, 2, 4-9, 11-16 and 18-36. The
examiner respectfully disagrees. The test is what the combined teachings of the 
references would have suggested to those of ordinary skill in the art. See In re Keller, 
642 F.2d 413, 208 USPQ 871 (CCPA 1981). It has been held that a prior art reference 
must either be in the field of applicant's endeavor or, if not, then be reasonably pertinent 
to the particular problem with which the applicant was concerned, in order to be relied 
upon as a basis for rejection of the claimed invention. See In re Oetiker, 977 F.2d 1443, 
24 USPQ2d 1443 (Fed. Cir. 1992). The examiner recognizes that obviousness can only 
be established by combining or modifying the teachings of the prior art to produce the
claimed invention where there is some teaching, suggestion, or motivation to do so 
found either in the references themselves or in the knowledge generally available to one 
of ordinary skill in the art. See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 
1988) and In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992). In this case, it 
would have been obvious to one of ordinary skill in the art to combine the teachings of 
Jain and Lee, as a whole, for improving the encoding of video image data so as to 
accurately encode images via the selection of feature points according to the motion of 
objects in a financially robust manner, as disclosed in Lee's column 2, lines 60-64. 

Regarding pages 17-19 of applicant's remarks about claim 1, applicant states
that Jain and Lee do not disclose "dividing the sequence of frames into frame
segments wherein the frames... wherein the sequence of frames is divided into frame 
segments based upon frames in each frame segment having at least a minimum 
number of feature points being tracked to at least one base frame in the frame 
segment." The examiner respectfully disagrees. In figure 8, Jain discloses that camera
1 obtains a sequence of 412 frames over approximately 13 seconds, and that the 30
frames obtained in each second, i.e., at the standard NTSC frame rate (30 frames/sec),
can be considered a frame segment. In this case, camera 1 has approximately 14
segments; thus, Jain discloses the division of the sequence of images into segments.
Also, in column 23, line 58 to column 24, line 3, Jain discloses the extraction of key
frames by selecting one key frame from every 30 frames, i.e., a segment of a sequence
of frames.

Clearly, Jain discloses segments within a sequence of frames;
otherwise, the ascertainment of key frames would not be possible,
where each segment is formed from a sequence of 30 frames. And in
fig. 12, Jain discloses multiple "image to ground projection" sections that are
used to calculate and project an image or a partial model for each segment that
includes three-dimensional occupancy estimation, for which a 3D map is generated in
an attempt to form a dynamic model. In column 21, line 63 to column 22, line 7, Jain
discloses the obtaining of the feature points within the frames. In column 22, line
62 to column 23, line 56, Jain discloses the use of equations that include
three-dimensional coordinates (x, y, z), which include camera position or pose, camera angle
and camera parameter, to obtain a partial model or an "image to ground projection".
Jain does not specifically disclose determining at least a minimum number of feature
points being tracked. However, in column 2, line 65 to column 3, line 31, Lee teaches
the use of threshold values TH and comparison of threshold values of feature points
between the current frame and the reference frame to check if the threshold is
exceeded; thus, a minimum number of feature points is determined. Thus,
Lee teaches determining at least a minimum number of feature points being tracked.
Therefore, it would have been obvious to one of ordinary skill in the art to combine the 
teachings of Jain and Lee, as a whole, for improving the encoding of video image data 
so as to accurately encode images via the selection of feature points according to the 
motion of objects in a financially robust manner, as disclosed in Lee's column 2, lines 
60-64. 

Regarding page 19 of applicant's remarks, applicant states that dependent
claims 2 and 4-8 are not disclosed by Jain and Lee. The examiner respectfully 
disagrees. Dependent claims 2 and 4-8 are rejected for at least similar reasons as 
claim 1 as stated in the above paragraphs and in the rejection below. 

Regarding pages 19-21 of applicant's remarks about claim 9, applicant contends 
that Jain and Lee do not disclose "a method of recovering a three-dimensional scene 
from two-dimensional images, the method comprising... dividing the sequence of 
frames into segments,... for each segment, encoding the frames in the segment into at 
least two virtual frames that include a three-dimensional structure for the segment and 
an uncertainty associated with the segment... for each of the at least two chosen 
frames, projecting a plurality of three-dimensional points into a corresponding virtual 
frame; and for each of the at least two chosen frames, projecting an uncertainty into the 
corresponding virtual frame". The examiner respectfully disagrees. 

In figure 8, Jain discloses that camera 1 obtains a sequence of 412 frames over
approximately 13 seconds, and that the 30 frames obtained in each second, i.e., at the
standard NTSC frame rate (30 frames/sec), can be considered a frame segment. In
this case, camera 1 has approximately 14 segments; thus, Jain discloses the division of
the sequence of images into segments. Also, in column 23, line 58 to column 24, line 3,
Jain discloses the extraction of key frames by selecting one key frame from every 30
frames, i.e., a segment of a sequence of frames. Thus, Jain discloses "dividing the
sequence of frames into segments..."
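
The segment count cited from figure 8 follows from simple arithmetic: 412 frames grouped 30 at a time give 13 full groups plus a remainder of 22 frames, i.e., approximately 14 segments. The following is a minimal Python sketch of that count, assuming fixed 30-frame segments and taking the first frame of each segment as its key frame. The function name and the choice of the first frame are illustrative assumptions and are not taken from Jain.

```python
def split_into_segments(num_frames, segment_len=30):
    """Group frame indices 0..num_frames-1 into consecutive fixed-length segments."""
    return [
        list(range(start, min(start + segment_len, num_frames)))
        for start in range(0, num_frames, segment_len)
    ]

# 412 frames is the figure cited from fig. 8; 30 is the NTSC frames-per-second rate.
segments = split_into_segments(412)

# Assumption for illustration only: the first frame of each segment is its key frame.
key_frames = [seg[0] for seg in segments]

print(len(segments), len(segments[-1]))  # number of segments, size of the last one
```

The last segment is shorter than 30 frames, which is why the count is stated as approximate.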

As stated in the above paragraphs and in the below rejection, in column 2, line
65 to column 3, line 31, Lee teaches the use of threshold values TH and comparison of 
threshold values of feature points between the current frame and the reference frame to 
check if the threshold is exceeded, thus, permitting the calculation of percentages of 
feature points in the base or current frame. Thus, Lee teaches a segment has at least a 
predetermined percentage of feature points identified in the base frame. Therefore, it 
would have been obvious to one of ordinary skill in the art to combine the teachings of 
Jain and Lee, as a whole, for improving the encoding of video image data so as to 
accurately encode images via the selection of feature points according to the motion of 
objects in a financially robust manner, as disclosed in Lee's column 2, lines 60-64. 

Clearly, Jain discloses segments within a sequence of frames;
otherwise, the ascertainment of key frames would not be possible,
where each segment is formed from a sequence of 30 frames. And in
fig. 12, Jain discloses multiple "image to ground projection" sections that are
used to calculate and project an image or a partial model for each segment that
includes three-dimensional occupancy estimation, for which a 3D map is generated in
an attempt to form a dynamic model. In column 21, line 63 to column 22, line 7, Jain
discloses the obtaining of the feature points within the frames. In column 22, line
62 to column 23, line 56, Jain discloses the use of equations that include
three-dimensional coordinates (x, y, z), which include camera position or pose, camera angle
and camera parameter, to obtain a partial model or an "image to ground projection".

In column 23, line 58 to column 24, line 3, Jain discloses the extraction of virtual
key frames by selecting one key frame from every 30 frames, in that every 30 frames
can be considered a segment of a sequence of frames. In column 24, lines 38-67, Jain
discloses that the virtual key frames are used to obtain the best possible three-dimensional
reconstruction of the two-dimensional frame data: if there are not enough known
points from the virtual key frames, i.e., uncertainty, the segmented frames are encoded
into at least two virtual key frames to ascertain the best possible three-dimensional
reconstruction of the two-dimensional frame data to yield the 3D visualization.
Thus, the combination of Jain and Lee discloses "dividing the sequence of frames into 
segments..." The limitation of "encoding the frames in the segment into at least two 
virtual frames... choosing at least two frames..." has already been addressed in the 
above paragraphs and in the rejection below. Peruse the above paragraphs and the 
rejection below. 

Dependent claims 11-16 and 18-22 are rejected for at least similar reasons as 
claim 9. 

Regarding pages 21-23 of applicant's remarks about claim 23, applicant argues 
that Jain and Lee, individually or in combination, do not disclose "determining a
number of the selected feature points... and if the number of the selected feature points 
from the base frame that are also identified in the next frame is greater than or equal to 
a threshold number... adding the next frame..." The examiner respectfully disagrees. 
In column 2, line 65 to column 3, line 31, Lee teaches the use of threshold values TH 
and comparison of threshold values of feature points between the current frame and the 
reference frame to check if the threshold is exceeded, thus permitting the calculation of
percentages of feature points in the base or current frame. Thus, Lee teaches
determining whether the number of selected feature points from the base frame that are also
identified in the next frame is greater than or equal to a threshold number. Therefore, it
would have been obvious to one of ordinary skill in the art to combine the teachings of 
Jain and Lee, as a whole, for improving the encoding of video image data so as to 
accurately encode images via the selection of feature points according to the motion of 
objects in a financially robust manner, as disclosed in Lee's column 2, lines 60-64.
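
The limitation quoted from claim 23 can be read as a simple loop: starting from a base frame, each next frame is added to the segment while the number of base-frame feature points still identified in it is greater than or equal to a threshold number. The following is a minimal Python sketch of that reading. Representing each frame as a set of feature-point IDs, the function name, and starting a new segment at the first frame that fails the test are illustrative assumptions; they are not taken from the claims, Jain, or Lee.

```python
def grow_segments(frames, threshold):
    """Split frames into segments by counting tracked base-frame feature points.

    frames: list of sets of feature-point IDs per frame (assumed representation).
    threshold: minimum number of base-frame points that must reappear in a frame.
    """
    segments = []
    i = 0
    while i < len(frames):
        base = frames[i]          # base frame of the new segment
        segment = [i]
        j = i + 1
        # Add the next frame while enough base-frame points are still identified in it.
        while j < len(frames) and len(base & frames[j]) >= threshold:
            segment.append(j)
            j += 1
        segments.append(segment)
        i = j                     # assumed: the first failing frame starts the next segment
    return segments

frames = [{1, 2, 3, 4}, {1, 2, 3}, {1, 2}, {5, 6, 7}, {5, 6}]
print(grow_segments(frames, threshold=2))
```

In this toy input, the fourth frame shares no feature points with the first base frame, so it becomes the base frame of a second segment.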

Regarding page 24 of applicant's remarks, applicant argues that claims 24-30 
are not disclosed by the combination of Jain and Lee. The examiner respectfully 
disagrees. Dependent claims 24-30 are rejected for at least similar reasons as stated 
for claim 23 in the above paragraph and the rejection below. 

Regarding pages 24-26 of applicant's remarks about claim 31, applicant contends
that claim 31 is not disclosed by the combination of Jain and Lee, and that "dividing the 
long sequence into segments includes identifying a base frame and tracking feature 
points... a predetermined threshold of feature points that are contained in the base 
frame". The examiner respectfully disagrees. The limitations of claim 31 are similar to
those of claims 1 and 23, and therefore, the issues of claim 31 have already been addressed
in the above paragraphs and in the rejection below. Peruse above and the
rejection below. 

Dependent claims 32-35 are rejected for at least similar reasons as claim 31 as 
previously stated in the above paragraphs and in the rejection below. 

Regarding pages 26-27 of applicant's remarks about claim 36, applicant states
that Jain does not disclose the limitation "calculating a partial model for each segment,
wherein the partial model includes the same number of frames as the segment it
represents... extracting virtual keyframes from each partial model... and bundle
adjusting the virtual key frames... obtain a complete three-dimensional reconstruction of
the two dimensional frames." The examiner respectfully disagrees. In fig. 12, Jain
discloses multiple "image to ground projection" sections that are used to
calculate and project an image or a partial model for each segment that includes
three-dimensional occupancy estimation, for which a 3D map is generated in an
attempt to form a dynamic model. In column 21, line 63 to column 22, line 7, Jain
discloses the obtaining of the feature points within the frames. In column 22, line 62 to
column 23, line 56, Jain discloses the use of equations that include three-dimensional
coordinates (x, y, z), which include camera position or pose, camera angle and camera
parameter, to obtain a partial model or an "image to ground projection", and that the
camera pose does contain rotation and translation, as illustrated by the discussion of angle
and the use of three-dimensional coordinates for obtaining rotation and translation. Thus,
the "calculating" limitation is met.

In column 23, line 58 to column 24, line 3, Jain discloses the extraction of key 
frames by selecting one key frame from every 30 frames in that every 30 frames can be 
considered a segment of a sequence of frames. Also, in column 24, lines 38-67, Jain
discloses that the key frames are used to obtain the best possible three-dimensional
reconstruction of the two-dimensional frame data, in that if there are not enough known
points from the key frames, i.e., uncertainty, estimates or bundle adjustments are made to
ascertain the best possible three-dimensional reconstruction of the two-dimensional
frame data to yield the 3D visualization. Thus, the "extracting" limitation is met.

In figure 12, Jain discloses the "3D visualization" section is the product of the 
adjusting of the virtual key frames to produce a complete three-dimensional 
reconstruction of the two dimensional frames obtained by video camera 1 to video 
camera N. Also, in column 24, lines 38-67, Jain discloses that the key frames are used to
obtain the best possible three-dimensional reconstruction of the two-dimensional frame
data, in that if there are not enough known points from the key frames, estimates or bundle
adjustments are made to ascertain the best possible three-dimensional reconstruction
of the two-dimensional frame data to yield the 3D visualization. Thus, the "bundle 
adjusting" limitation is met. 

Jain does not disclose determining that the partial model includes the same
number of frames as the segment it represents. However, in column 2, line 65 to
column 3, line 31, Lee teaches the use of threshold values TH and comparison of
threshold values of feature points between the current frame and the reference frame to
check if the threshold is exceeded; thus, a comparison is done to see if the partial
model includes the same number of frames as the segment it represents. Thus, Lee
discloses determining that the partial model includes the same number of frames as the
segment it represents. Therefore, it would have been obvious to one of ordinary skill in
the art to combine the teachings of Jain and Lee, as a whole, for improving the
encoding of video image data so as to accurately encode images via the selection of
feature points according to the motion of objects in a financially robust manner, as
disclosed in Lee's column 2, lines 60-64.

In conclusion, the rejection of the claims is maintained. 

Claim Rejections - 35 USC § 102 

1. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

2. Claim 37 is rejected under 35 U.S.C. 102(b) as being anticipated by Jain et al 
(5,729,471). 

Regarding claim 37, Jain discloses an apparatus for recovering a three- 
dimensional scene from a sequence of two-dimensional frames by segmenting the 
frames, comprising: 

means for capturing two-dimensional images (fig. 12, note camera 1 obtains video
images in two-dimensional form; also see fig. 8, note camera 1 obtains a sequence of
two-dimensional images, and cameras 2 and 3 also obtain a corresponding sequence of
images; col. 22, ln. 1-3);

means for dividing the sequence into segments (fig. 8, note that camera 1 obtains
a sequence of 412 frames over approximately 13 seconds, and that the 30 frames
obtained in each second, i.e., at the standard NTSC frame rate (30 frames/sec), can be
considered a segment, so in this case, camera 1 has approximately 14 segments; thus,
Jain discloses the division of the sequence of images into segments; also, in col. 23,
ln. 58 to col. 24, ln. 3, Jain discloses the extraction of key frames by selecting one key
frame from every 30 frames, i.e., a segment of a sequence of frames; clearly, Jain
discloses segments within a sequence of frames, since otherwise the
ascertainment of key frames would not be possible, where each
segment is formed from a sequence of 30 frames);

means for calculating a partial model for each segment that includes three- 
dimensional coordinates and camera pose for features within the frames of the 
segment, the three-dimensional coordinates and camera pose being derived from the 
frames of the segment (fig. 12, note there are multiple "image to ground projection"
sections that are used to calculate and project an image or a partial model for each
segment that includes three-dimensional occupancy estimation, for which a 3D map
is generated in an attempt to form a dynamic model; col. 21, ln. 63 to col. 22, ln. 7, Jain
discloses the obtaining of the feature points within the frames; col. 22, ln. 62 to col. 23,
ln. 56, Jain discloses the use of equations that include three-dimensional coordinates
(x, y, z), which include camera position or pose, camera angle and camera parameter, to
obtain a partial model or an "image to ground projection");

means for extracting virtual key frames from each partial model (col. 23, ln. 58 to
col. 24, ln. 3; Jain discloses the extraction of key frames by selecting one key frame from
every 30 frames in that every 30 frames can be considered a segment of a sequence of 
frames); and 

means for bundle adjusting the virtual key frames to obtain a complete
three-dimensional reconstruction of the two-dimensional frames (fig. 12, note the "3D
visualization" section is the product of the adjusting of the virtual key frames to produce
a complete three-dimensional reconstruction of the two-dimensional frames obtained by
video camera 1 to video camera N; also, col. 24, ln. 38-67, Jain discloses that the key frames
are used to obtain the best possible three-dimensional reconstruction of the
two-dimensional frame data, in that if there are not enough known points from the key frames,
estimates or bundle adjustments are made to ascertain the best possible
three-dimensional reconstruction of the two-dimensional frame data to yield the 3D
visualization).

Claim Rejections - 35 USC § 103 

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

4. Claims 1-2, 4-9, 11-16 and 18-36 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Jain et al (5,729,471) in view of Lee (5,612,743). 

Regarding claim 1, Jain discloses a method of recovering a three-dimensional
scene from two-dimensional images, the method comprising: 

providing a sequence of frames (fig. 12, note camera 1 obtains video images in
two-dimensional form; also see fig. 8, note camera 1 obtains a sequence of
two-dimensional images, and cameras 2 and 3 also obtain a corresponding sequence of
images; col. 22, ln. 1-3);

dividing the sequence of frames into frame segments wherein the frames in the 
sequence comprise feature points and wherein the sequence of frames is divided into 
frame segments (fig. 8, note that camera 1 obtains a sequence of 412 frames over
approximately 13 seconds, and that the 30 frames obtained in each second, i.e., at the
standard NTSC frame rate (30 frames/sec), can be considered a frame segment, so in
this case, camera 1 has approximately 14 segments; thus, Jain discloses the division of
the sequence of images into segments; also, in col. 23, ln. 58 to col. 24, ln. 3, Jain
discloses the extraction of key frames by selecting one key frame from every 30 frames,
i.e., a segment of a sequence of frames; clearly, Jain discloses segments within
a sequence of frames, since otherwise the ascertainment of key frames would not be
possible, where each segment is formed from a sequence of 30
frames; also fig. 12, note there are multiple "image to ground projection" sections that
are used to calculate and project an image or a partial model for each segment that
includes three-dimensional occupancy estimation, for which a 3D map is generated in
an attempt to form a dynamic model; col. 21, ln. 63 to col. 22, ln. 7, Jain discloses the
obtaining of the feature points within the frames; col. 22, ln. 62 to col. 23, ln. 56, Jain
discloses the use of equations that include three-dimensional coordinates (x, y, z), which
include camera position or pose, camera angle and camera parameter, to obtain a
partial model or an "image to ground projection");

performing three-dimensional reconstruction individually for each frame segment
derived by dividing the sequence of frames (fig. 12, note there are multiple "image to
ground projection" sections that are used to calculate and project an image or a partial
model for each segment that includes three-dimensional occupancy estimation, for
which a 3D map is generated in an attempt to form a dynamic model; col. 21, ln. 63 to
col. 22, ln. 7, Jain discloses the obtaining of the feature points within the frames; col. 22,
ln. 62 to col. 23, ln. 56, Jain discloses the use of equations that include three-dimensional
coordinates (x, y, z), which include camera position or pose, camera angle
and camera parameter, to obtain a partial model or an "image to ground projection"); and

combining the three-dimensional reconstructed segments together to recover a 
three-dimensional scene for the sequence of images (fig. 12, note the "3D visualization"
section is the product of the adjusting of the virtual key frames to produce a complete
three-dimensional reconstruction of the two-dimensional frames obtained by video
camera 1 to video camera N; also, col. 24, ln. 38-67, Jain discloses that the key frames are
used to obtain the best possible three-dimensional reconstruction of the
two-dimensional frame data, in that if there are not enough known points from the key frames,
estimates or bundle adjustments are made to ascertain the best possible
three-dimensional reconstruction of the two-dimensional frame data to yield the 3D
visualization).

Jain does not specifically disclose determining at least a minimum number of
feature points being tracked. However, Lee teaches determining at least a minimum
number of feature points being tracked (col. 2, ln. 65 to col. 3, ln. 31; Lee teaches the use
of threshold values TH and comparison of threshold values of feature points between
the current frame and the reference frame to check if the threshold is exceeded; thus,
a minimum number of feature points is determined). Therefore, it would
have been obvious to one of ordinary skill in the art to combine the teachings of Jain 
and Lee, as a whole, for improving the encoding of video image data so as to accurately 
encode images via the selection of feature points according to the motion of objects in a 
financially robust manner (col. 2, ln.60-64). 

Regarding claim 2, Jain discloses the use of virtual key frames (col. 23, ln. 58 to
col. 24, ln. 3; Jain discloses the extraction of key frames by selecting one key frame from
every 30 frames, i.e., a segment of a sequence of frames).

Regarding claim 4, Jain discloses the performance of a two-frame structure from
motion algorithm on each of the segments to create a partial model (fig. 12, note there
are multiple "image to ground projection" sections that are used to calculate and project
an image or a partial model for each segment that includes three-dimensional
occupancy estimation, for which a 3D map is generated in an attempt to form a
dynamic model; col. 21, ln. 63 to col. 22, ln. 7, Jain discloses the obtaining of the feature
points within the frames; col. 22, ln. 62 to col. 23, ln. 56, Jain discloses the use of
equations that include three-dimensional coordinates (x, y, z), which include camera
position or pose, camera angle and camera parameter, to obtain a partial model or an
"image to ground projection"); and eliminating ambiguity (col. 24, ln. 38-67, Jain discloses
that the virtual key frames are used to obtain the best possible three-dimensional
reconstruction of the two-dimensional frame data: if there are not enough known
points from the virtual key frames, i.e., uncertainty, the segmented frames are encoded
into at least two virtual key frames to ascertain the best possible three-dimensional
reconstruction of the two-dimensional frame data to yield the 3D visualization).

Regarding claims 5 and 7, Jain discloses extracting virtual key frames (col. 23,
ln. 58 to col. 24, ln. 3; Jain discloses the extraction of key frames by selecting one key
frame from every 30 frames, in that every 30 frames can be considered a segment of a
sequence of frames; also, col. 24, ln. 38-67, Jain discloses that the key frames are used to
obtain the best possible three-dimensional reconstruction of the two-dimensional frame
data, in that if there are not enough known points from the key frames, i.e., uncertainty,
estimates or bundle adjustments are made to ascertain the best possible
three-dimensional reconstruction of the two-dimensional frame data to yield the 3D
visualization) and bundle adjustment of key frames (fig. 12, note the "3D visualization"
section is the product of the adjusting of the virtual key frames to produce a complete
three-dimensional reconstruction of the two-dimensional frames obtained by video
camera 1 to video camera N; also, col. 24, ln. 38-67, Jain discloses that the key frames are
used to obtain the best possible three-dimensional reconstruction of the
two-dimensional frame data, in that if there are not enough known points from the key frames,
estimates or bundle adjustments are made to ascertain the best possible
three-dimensional reconstruction of the two-dimensional frame data to yield the 3D
visualization).

Regarding claim 6, Jain discloses identifying feature points, estimating
three-dimensional coordinates, and estimating camera rotation and translation (fig. 12, note
there are multiple "image to ground projection" sections that are used to calculate and
project an image or a partial model for each segment that includes three-dimensional
occupancy estimation, for which a 3D map is generated in an attempt to form a
dynamic model; col. 21, ln. 63 to col. 22, ln. 7, Jain discloses the obtaining of the feature
points within the frames; col. 22, ln. 62 to col. 23, ln. 56, Jain discloses the use of
equations that include three-dimensional coordinates (x, y, z), which include camera
position or pose, camera angle and camera parameter, to obtain a partial model or an
"image to ground projection").

Regarding claim 8, Jain discloses the use of a computer-readable medium to 
execute instructions for performing the method of claim 1 (col. 15, ln. 65-67).

Regarding claims 9 and 21, Jain discloses a method of recovering a
three-dimensional scene from two-dimensional images, the method comprising:

identifying a sequence of two-dimensional frames that include two-dimensional 
images (fig. 12, note camera 1 obtains video images in two-dimensional form; also see
fig. 8, note camera 1 obtains a sequence of two-dimensional images, and cameras 2
and 3 also obtain a corresponding sequence of images; col. 22, ln. 1-3);

dividing the sequence of images into segments, wherein a segment includes a 
plurality of frames (fig. 8, note that camera 1 obtains a sequence of 412 frames for 
approximately 13 seconds, and that every 30 frames obtained for each second, ie. the 
standard NTSC frame rate (30 frames/sec), can be considered a segment, so in this 
case, camera 1 has approximately 14 segments, thus, Jain discloses the division of the 
sequence of images into segments; also, in col.23, ln.58 to col.24, ln.3; Jain discloses
the extraction of key frames by selecting one key frame from every 30 frames, ie. a
segment of a sequence of frames, clearly, Jain discloses there are segments within a
sequence of frames, otherwise, the ascertainment of key frames would not be possible
without these segments, where each segment is formed from a sequence of 30 frames)
and wherein dividing includes, identifying the base frame, identifying the feature points
in the base frame, and defining the segments (col.21, ln.63 to col.22, ln.7; Jain
discloses the identification of feature points in the plural frames that includes the first 
base frame in the segments from the sequence of images); 

for each segment, encoding the frames in the segment into at least two virtual 
frames that include a three-dimensional structure for the segment and an uncertainty 
associated with the segment and wherein encoding includes choosing at least two 
frames (col.23, ln.58 to col.24, ln.3; Jain discloses the extraction of virtual key frames by
selecting one key frame from every 30 frames in that every 30 frames can be 
considered a segment of a sequence of frames; also, col.24, ln.38-67, Jain discloses 
the virtual key frames are used to obtain the best possible three-dimensional 
reconstruction of the two-dimensional frame data in that if there is not enough known 
points, ie. uncertainty, from virtual keyframes, thus, segmented frames are encoded 
into at least two virtual key frames to ascertain the best, possible three-dimensional 
reconstruction of the two-dimensional frame data to yield the 3D visualization); 

projecting a plurality of three dimensional points into a corresponding virtual 
frame (also, col.24, ln.38-67, Jain discloses the virtual key frames are used to obtain the 
best possible three-dimensional reconstruction of the two-dimensional frame data in that
if there is not enough known points, ie. uncertainty, from virtual key frames, thus,
segmented frames are encoded into at least two virtual key frames to ascertain the 
best, possible three-dimensional reconstruction of the two-dimensional frame data to 
yield the 3D visualization); and 

projecting an uncertainty into the corresponding virtual frame (also, col.24, ln.38-67,
Jain discloses the virtual key frames are used to obtain the best possible three-
dimensional reconstruction of the two-dimensional frame data in that if there is not
enough known points, ie. uncertainty, from virtual key frames, thus, segmented frames 
are encoded into at least two virtual key frames to ascertain the best, possible three- 
dimensional reconstruction of the two-dimensional frame data to yield the 3D 
visualization). 

Jain does not specifically disclose determining the segments such that every 
frame in a segment has at least a predetermined percentage of feature points identified 
in the base frame. However, Lee teaches a segment has at least a predetermined 
percentage of feature points identified in the base frame (col.2, ln.65 to col.3, ln.31; Lee 
teaches the use of threshold values TH and comparison of threshold values of feature 
points between the current frame and the reference frame to check if the threshold is 
exceeded, thus, permitting the calculation of percentages of feature points in the base 
or current frame). Therefore, it would have been obvious to one of ordinary skill in the 
art to combine the teachings of Jain and Lee, as a whole, for improving the encoding of 
video image data so as to accurately encode images via the selection of feature points 
according to the motion of objects in a financially robust manner (col.2, ln.60-64).

Regarding claim 11, Jain discloses the variation of segments and variation of
frames (fig.8, note camera 1 has 413 frames in approximately 13 seconds,
where each segment has 30 frames to obtain approximately 13 segments from camera
1, whereas camera 2 has 181 frames in 6 seconds, or approximately 6 segments from
camera 2, etc.).
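
For illustration only, the segment arithmetic relied on above can be checked with a short Python sketch. It assumes fixed 30-frame segments (the NTSC rate of 30 frames/sec) and, as a further assumption not stated in the record, takes the first frame of each 30-frame block as the key frame. The function names are hypothetical. The sketch suggests why a sequence of 412 or 413 frames may be described as either 13 segments (full segments only) or 14 segments (counting the trailing partial segment).

```python
import math


def segment_counts(num_frames, frames_per_segment=30):
    """Return (full_segments, segments_including_partial) for a sequence."""
    full = num_frames // frames_per_segment
    with_partial = math.ceil(num_frames / frames_per_segment)
    return full, with_partial


def key_frame_indices(num_frames, frames_per_segment=30):
    """Assumed selection rule: the first frame of every block is the key frame."""
    return list(range(0, num_frames, frames_per_segment))


if __name__ == "__main__":
    # Frame counts as cited from fig. 8 in this action (camera 1 and camera 2).
    for frames in (412, 413, 181):
        print(frames, segment_counts(frames))
```

Under these assumptions, camera 2's 181 frames give 6 full segments plus one single-frame remainder.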

Regarding claim 12, Jain discloses identifying feature points, estimating three
dimensional coordinates, and estimating camera rotation and translation (fig. 12, note
there are multiple "image to ground projection" sections that are used to calculate and
project an image or a partial model for each segment that includes three-dimensional
occupancy estimation for which a 3D map is generated in an attempt to form a
dynamic model; col.21, ln.63 to col.22, ln.7, Jain discloses the obtaining of the feature
points within the frames; col.22, ln.62 to col.23, ln.56, Jain discloses the use of 
equations that includes three dimensional coordinates (x, y, z) that includes camera 
position or pose, camera angle and camera parameter to obtain a partial model or a 
"image to ground projection"). 

Regarding claims 13-16, Jain discloses the performance of a two-frame structure 
from motion algorithm on each of the segments to create a partial model (fig. 12, note 
there are multiple "image to ground projection" sections that are used to calculate and 
project an image or a partial model for each segment that includes three-dimensional
occupancy estimation for which a 3D map is generated in an attempt to form a
dynamic model; col.21, ln.63 to col.22, ln.7, Jain discloses the obtaining of the feature



Application/Control Number: 09/338,176 Page 23 

Art Unit: 2621 

points within the frames; col.22, ln.62 to col.23, ln.56, Jain discloses the use of 
equations that includes three dimensional coordinates (x, y, z) that includes camera 
position or pose, camera angle and camera parameter to obtain a partial model or a
"image to ground projection"); and eliminating ambiguity (col.24, ln.38-67, Jain discloses
the virtual key frames are used to obtain the best possible three-dimensional 
reconstruction of the two-dimensional frame data in that if there is not enough known 
points, ie. uncertainty, from virtual key frames, thus, segmented frames are encoded 
into at least two virtual key frames to ascertain the best, possible three-dimensional 
reconstruction of the two-dimensional frame data to yield the 3D visualization). 

Regarding claim 18, Jain discloses extracting virtual keyframes (col.23, ln.58 to 
col.24, ln.3; Jain discloses the extraction of key frames by selecting one key frame from
every 30 frames in that every 30 frames can be considered a segment of a sequence of
frames; also, col.24, ln.38-67, Jain discloses the keyframes are used to obtain the best
possible three-dimensional reconstruction of the two-dimensional frame data in that if 
there is not enough known points, ie. uncertainty, from key frames, estimates or bundle 
adjustments were made to ascertain the best, possible three-dimensional reconstruction 
of the two-dimensional frame data to yield the 3D visualization) and bundle adjustment 
of key frames (fig. 12, note the "3D visualization" section is the product of the adjusting 
of the virtual key frames to produce a complete three-dimensional reconstruction of the 
two dimensional frames obtained by video camera 1 to video camera N; also, col.24, 
ln.38-67, Jain discloses the key frames are used to obtain the best possible three-
dimensional reconstruction of the two-dimensional frame data in that if there is not
enough known points from key frames, estimates or bundle adjustments were made to
ascertain the best, possible three-dimensional reconstruction of the two-dimensional 
frame data to yield the 3D visualization). 

Regarding claim 19, Jain discloses performing motion estimation to identify 
feature points (col.21, ln.63 to col.22, ln.7).

Regarding claim 20, Jain does not specifically disclose creating a template block 
in a first frame, creating a search window used in the second frame, and comparing an 
intensity difference between the search window and the template block to locate the 
feature point in the second frame. However, Lee teaches creating a template block
in a first frame, creating a search window used in the second frame, and comparing an 
intensity difference between the search window and the template block to locate the 
feature point in the second frame (fig.4, note frame A and frame B are the first and 
second frames, note fig. 3, element 313 also discloses the comparison process to 
compare differences to determine or locate the feature point in the second frame). 
Therefore, it would have been obvious to one of ordinary skill in the art to combine the 
teachings of Jain and Lee, as a whole, for improving the encoding of video image data 
so as to accurately encode images via the selection of feature points according to the 
motion of objects in a financially robust manner (col.2, ln.60-64).

Regarding claim 22, Jain discloses the use of a computer-readable medium to 
execute instructions for performing the method of claim 9 (col.15, ln.65-67).

Regarding claims 23, 24 and 28, Jain discloses a method of recovering a three-
dimensional scene from a sequence of two-dimensional frames, comprising:

identifying at least a first base frame in a sequence of two dimensional frames
(fig. 12, note camera 1 obtains video images in two-dimensional form; also see fig. 8, note
camera 1 obtains a sequence of two-dimensional images, and cameras 2 and 3 also
obtain a corresponding sequence of images; see col.22, ln.1-3; fig. 8, note that camera 1
obtains a sequence of 412 frames for approximately 13 seconds, and that every 30
frames obtained for each second, ie. the standard NTSC frame rate (30 frames/sec),
can be considered a segment, so in this case, camera 1 has approximately 14
segments, thus, Jain discloses the division of the sequence of images into segments;
also, in col.23, ln.58 to col.24, ln.3; Jain discloses the extraction of key frames by
selecting one key frame from every 30 frames, ie. a segment of a sequence of frames, 
clearly, Jain discloses there are segments within a sequence of frames, otherwise, the 
ascertainment of key frames would not be possible without these segments, where each 
segment is formed from a sequence of 30 frames); 

adding the at least first base frame to create a first segment of frames of the 
sequence (fig. 8, note that camera 1 obtains a sequence of 412 frames for 
approximately 13 seconds, and that every 30 frames obtained for each second, ie. the 
standard NTSC frame rate (30 frames/sec), can be considered a frame segment, so in 
this case, camera 1 has approximately 14 frame segments, so a first segment of the 
sequence is created); 

selecting feature points in at least a first base frame in a first segment of frames 
in the sequence (col.21, ln.63 to col.22, ln.7; Jain discloses the identification and 
selection of feature points in the plural frames that includes the first base frame); and

analyzing a second frame in the segment to identify the feature points in the
second frame (col.21, ln.63 to col.22, ln.7; Jain discloses the identification of feature 
points in each frame from a plurality of frames that includes the second frame). 

Jain does not specifically disclose adding the second frame to the segment.
However, Jain discloses the manual adjustment of the number of key frames, where the
number is one key frame for every thirty frames, ie. a segment (col.23, ln.64 to col.24,
ln.3). Therefore, since Jain teaches the manual adjustment of one key frame or
representative frame for every thirty frames, it would have been obvious to one of 
ordinary skill in the art to manually change the number of key (representative) frames 
per segment from anywhere between two to five key or representative frames per 
segment if necessary for accurately enhancing the three-dimensional representation of 
the targeted scene. 

Jain does not specifically disclose the determining the number of selected feature 
points from the base frame that are also identified in the next frame is greater than or 
equal to a threshold number. However, Lee teaches the determining the number of 
selected feature points from the base frame that are also identified in the next frame is 
greater than or equal to a threshold number (col.2, ln.65 to col.3, ln.31; Lee teaches the
use of threshold values TH and comparison of threshold values of feature points 
between the current frame and the reference frame to check if the threshold is 
exceeded, thus, permitting the calculation of percentages of feature points in the base 
or current frame). Therefore, it would have been obvious to one of ordinary skill in the 
art to combine the teachings of Jain and Lee, as a whole, for improving the encoding of
video image data so as to accurately encode images via the selection of feature points
according to the motion of objects in a financially robust manner (col.2, ln.60-64).
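
For illustration only, the threshold limitation discussed above can be sketched as follows: a frame is added to the current segment only while the number of base-frame feature points also identified in that frame is greater than or equal to a threshold number. This is a reading of the claim language as quoted in this action, not a reproduction of Lee's threshold TH comparison or of Jain's key frame selection. Feature points are represented by hypothetical integer IDs, and the function names are assumptions.

```python
def grow_segment(base_points, later_frames, threshold):
    """Return the segment length (base frame included).

    A later frame joins the segment only while it still shares at least
    `threshold` feature points with the base frame.
    """
    length = 1
    for points in later_frames:
        if len(base_points & points) < threshold:
            break
        length += 1
    return length


def split_sequence(frames, threshold):
    """Split a sequence into (start, end) segments, end exclusive.

    Each segment starts at a new base frame, namely the first frame that
    failed the threshold test against the previous base frame.
    """
    segments = []
    start = 0
    while start < len(frames):
        length = grow_segment(frames[start], frames[start + 1:], threshold)
        segments.append((start, start + length))
        start += length
    return segments


if __name__ == "__main__":
    # Each set holds the IDs of the feature points identified in one frame.
    frames = [{1, 2, 3, 4}, {1, 2, 3}, {1, 2}, {5, 6, 7, 8}, {5, 6, 7, 8}]
    print(split_sequence(frames, threshold=3))
```

Dividing the shared count by the number of base-frame points would give the "predetermined percentage" variant of the same test.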

Regarding claim 25, Jain discloses performing motion estimation to identify 
feature points (col.21, ln.63 to col.22, ln.7). 

Regarding claim 26, Jain discloses the identification of corners as feature points 
(col.22, ln.15-22; note the disclosure of borders, hashlines, marks are feature points to
create corners as to determine camera status and pose). 

Regarding claim 27, Jain discloses the number of frames can vary between 
segments (col.23, ln.64 to col.24, ln.3). 

Regarding claim 29, Jain discloses the bundle adjustment of key frames (fig. 12, 
note the "3D visualization" section is the product of the adjusting of the virtual key 
frames to produce a complete three-dimensional reconstruction of the two dimensional 
frames obtained by video camera 1 to video camera N; also, col.24, ln.38-67, Jain
discloses the key frames are used to obtain the best possible three-dimensional 
reconstruction of the two-dimensional frame data in that if there is not enough known 
points from key frames, estimates or bundle adjustments were made to ascertain the 
best, possible three-dimensional reconstruction of the two-dimensional frame data to 
yield the 3D visualization). 

Regarding claim 30, Jain discloses the use of a computer-readable medium to 
execute instructions for performing the method of claim 23 (col.15, ln.65-67).

Regarding claim 31, Jain discloses a method of recovering a three-dimensional 
scene from a sequence of two-dimensional frames (fig. 12), an improvement comprising:

dividing a long sequence of frames into segments (fig.8, note that camera 1
obtains a sequence of 412 frames for approximately 13 seconds, and that every 30 
frames obtained for each second, ie. the standard NTSC frame rate (30 frames/sec), 
can be considered a segment, so in this case, camera 1 has approximately 14 
segments, thus, Jain discloses the division of the sequence of images into segments; 
also, in col.23, ln.58 to col.24, ln.3; Jain discloses the extraction of key frames by
selecting one key frame from every 30 frames, ie. a segment of a sequence of frames, 
clearly, Jain discloses there are segments within a sequence of frames, otherwise, the 
ascertainment of key frames would not be possible without these segments, where each 
segment is formed from a sequence of 30 frames), 

wherein the representative frames are used to recover the three-dimensional 
scene and remaining frames are discarded so that three-dimensional scene is 
effectively compressed (col.23, ln.58 to col.24, ln.3; Jain discloses the extraction of
virtual key frames by selecting one key frame from every 30 frames in that every 30
frames can be considered a segment of a sequence of frames; also, col.24, ln.38-67,
Jain discloses the virtual key frames are used to obtain the best possible three- 
dimensional reconstruction of the two-dimensional frame data in that if there is not 
enough known points, ie. uncertainty, from virtual key frames, thus, segmented frames 
are encoded into at least two virtual key frames to ascertain the best, possible three- 
dimensional reconstruction of the two-dimensional frame data to yield the 3D 
visualization, the excess remaining frames are discarded),

wherein dividing the long sequence into segments includes identifying a base
frame and tracking feature points between frames in the sequence (col.21, ln.63 to
col.22, ln.7; Jain discloses the identification of feature points in the plural frames that
includes the first base frame in the segments from the sequence of images). 

Jain does not specifically disclose reducing the number of frames in each
segment by representing the segments using between two and five representative 
frames per segment. However, Jain discloses the manual adjustment of the number of 
key frames, where the number is one key frame for every thirty frames, ie. a segment 
(col.23, ln.64 to col.24, ln.3). Therefore, since Jain teaches the manual adjustment of
one key frame or representative frame for every thirty frames, it would have been 
obvious to one of ordinary skill in the art to manually change the number of key 
(representative) frames per segment from anywhere between two to five key or 
representative frames per segment if necessary for accurately enhancing the three- 
dimensional representation of the targeted scene. 

Jain does not disclose a predetermined threshold of feature points that are 
contained in the base frame. However, Lee teaches the predetermined threshold of 
feature points that are contained in the base frame (col.2, ln.65 to col.3, ln.31; Lee
teaches the use of threshold values TH and comparison of threshold values of feature 
points between the current frame and the reference frame to check if the threshold is 
exceeded). Therefore, it would have been obvious to one of ordinary skill in the art to 
combine the teachings of Jain and Lee, as a whole, for improving the encoding of video
image data so as to accurately encode images via the selection of feature points
according to the motion of objects in a financially robust manner (col.2, ln.60-64). 

Regarding claim 32, Jain discloses that each representative frame has an
associated uncertainty (col.24, ln.38-67, Jain discloses the keyframes are used to
obtain the best possible three-dimensional reconstruction of the two-dimensional frame 
data in that if there is not enough known points, ie. uncertainty, from key frames, 
estimates or bundle adjustments were made to ascertain the best, possible three- 
dimensional reconstruction of the two-dimensional frame data to yield the 3D 
visualization). 

Regarding claim 33, Jain discloses the long sequence of frames includes over 75 
frames (fig. 8, note that camera 1 obtains a sequence of 412 frames, which clearly is 
over 75 frames). 

Regarding claim 34, Jain discloses the division of the long sequence into 
segments and tracking feature points (fig. 8, note that camera 1 obtains a sequence of 
412 frames for approximately 13 seconds, and that every 30 frames obtained for each 
second, ie. the standard NTSC frame rate (30 frames/sec), can be considered a 
segment, so in this case, camera 1 has approximately 14 segments, thus, Jain 
discloses the division of the sequence of images into segments; also, in col.23, ln.58 to
col.24, ln.3; Jain discloses the extraction of key frames by selecting one key frame from
every 30 frames, ie. a segment of a sequence of frames, clearly, Jain discloses there 
are segments within a sequence of frames, otherwise, the ascertainment of key frames 
would not be possible without these segments, where each segment is formed from a
sequence of 30 frames; col.21, ln.63 to col.22, ln.7, Jain discloses the obtaining of the
feature points within the frames). 

Regarding claim 35, Jain discloses the performance of a two-frame structure 
from motion algorithm on each of the segments to create a partial model (fig. 12, note 
there are multiple "image to ground projection" sections that are used to calculate and 
project an image or a partial model for each segment that includes three-dimensional
occupancy estimation for which a 3D map is generated in an attempt to form a
dynamic model; col.21, ln.63 to col.22, ln.7, Jain discloses the obtaining of the feature 
points within the frames; col.22, ln.62 to col.23, ln.56, Jain discloses the use of 
equations that includes three dimensional coordinates (x, y, z) that includes camera 
position or pose, camera angle and camera parameter to obtain a partial model or a 
"image to ground projection"). 

Regarding claim 36, Jain discloses a computer-readable medium having 
computer-executable instructions for performing a method comprising: 

providing a sequence of two-dimensional frames (fig. 12, note camera 1 obtains
video images in two-dimensional form; also see fig. 8, note camera 1 obtains a
sequence of two-dimensional images, and cameras 2 and 3 also obtain a corresponding
sequence of images; col.22, ln.1-3);

dividing the sequence into segments (fig.8, note that camera 1 obtains a 
sequence of 412 frames for approximately 13 seconds, and that every 30 frames 
obtained for each second, ie. the standard NTSC frame rate (30 frames/sec), can be 
considered a segment, so in this case, camera 1 has approximately 14 segments, thus,
Jain discloses the division of the sequence of images into segments; also, in col.23,
ln.58 to col.24, ln.3; Jain discloses the extraction of key frames by selecting one key
frame from every 30 frames, ie. a segment of a sequence of frames, clearly, Jain 
discloses there are segments within a sequence of frames, otherwise, the 
ascertainment of key frames would not be possible without these segments, where each 
segment is formed from a sequence of 30 frames); 

calculating a partial model for each segment that includes three-dimensional 
coordinates and camera pose for features within the frames, the camera pose 
comprising rotation and translation (fig. 12, note there are multiple "image to ground 
projection" sections that are used to calculate and project an image or a partial model 
for each segment that includes three-dimensional occupancy estimation for which a
3D map is generated in an attempt to form a dynamic model; col.21, ln.63 to col.22,
ln.7, Jain discloses the obtaining of the feature points within the frames; col.22, ln.62 to
col.23, ln.56, Jain discloses the use of equations that includes three dimensional
coordinates (x, y, z) that includes camera position or pose, camera angle and camera 
parameter to obtain a partial model or a "image to ground projection" and that the 
camera pose does contain rotation and translation, as illustrated by discussion of angle 
and use of three-dimensional coordinates for obtaining rotation and translation); 

extracting virtual key frames from each partial model, the virtual key frames 
having three-dimensional coordinates for the frames and an uncertainty associated with 
the frames (col.23, ln.58 to col.24, ln.3; Jain discloses the extraction of key frames by 
selecting one key frame from every 30 frames in that every 30 frames can be
considered a segment of a sequence of frames; also, col.24, ln.38-67, Jain discloses
the key frames are used to obtain the best possible three-dimensional reconstruction of 
the two-dimensional frame data in that if there is not enough known points, ie. 
uncertainty, from key frames, estimates or bundle adjustments were made to ascertain 
the best, possible three-dimensional reconstruction of the two-dimensional frame data 
to yield the 3D visualization); and 

bundle adjusting the virtual key frames to obtain a complete three-dimensional 
reconstruction of the two-dimensional frames (fig. 12, note the "3D visualization" section 
is the product of the adjusting of the virtual key frames to produce a complete three- 
dimensional reconstruction of the two dimensional frames obtained by video camera 1 
to video camera N; also, col.24, ln.38-67, Jain discloses the key frames are used to 
obtain the best possible three-dimensional reconstruction of the two-dimensional frame 
data in that if there is not enough known points from key frames, estimates or bundle 
adjustments were made to ascertain the best, possible three-dimensional reconstruction 
of the two-dimensional frame data to yield the 3D visualization). 

Jain does not disclose the determination of the partial model including same 
number of frames as the segment said partial model represents. However, Lee 
discloses the determination of the partial model including same number of frames as the 
segment it represents (col.2, ln.65 to col.3, ln.31; Lee teaches the use of threshold
values TH and comparison of threshold values of feature points between the current 
frame and the reference frame to check if the threshold is exceeded, thus, a comparison 
is done to see if the partial model includes the same number of frames as the segment
it represents). Therefore, it would have been obvious to one of ordinary skill in the art to
combine the teachings of Jain and Lee, as a whole, for improving the encoding of video 
image data so as to accurately encode images via the selection of feature points 
according to the motion of objects in a financially robust manner (col.2, ln.60-64).

Conclusion 

5. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time 
policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. 

Contact Information 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Allen Wong whose telephone number is (571) 272-7341.
The examiner can normally be reached on Mondays to Thursdays from 8am-6pm 
Flextime. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, James J. Groody can be reached on (571) 272-7418. The fax phone